Skip to content

chore(ci): refine Claude Code Action workflows#4

Merged
Sardor-M merged 1 commit into
mainfrom
chore/refine-claude-workflows
May 11, 2026
Merged

chore(ci): refine Claude Code Action workflows#4
Sardor-M merged 1 commit into
mainfrom
chore/refine-claude-workflows

Conversation

@Sardor-M
Copy link
Copy Markdown

@Sardor-M Sardor-M commented May 8, 2026

Summary

Replace the default-boilerplate @claude handler and the plugin-based auto-review with a tailored pair adapted from a production-tested setup. Same two files, materially different behavior.

claude.yml (the @claude mention handler)

  • Concurrency group keyed on issue/PR number. Rapid @claude mentions on the same PR no longer spawn parallel agent runs.
  • Write permissions on pull-requests and issues so the agent can post comments and edit PR descriptions (was: read-only).
  • timeout-minutes: 30 to bound runaway jobs.
  • Model pinned to claude-sonnet-4-6.
  • Tool allowlist covers the toolchain agents will actually need: cargo check/build/test/clippy/fmt, npm run *, npx prettier/eslint, yarn *, plus gh pr/api, git diff/log/show, and Read/Edit/Write/Glob/Grep.

claude-code-review.yml (auto PR review)

Replace the marketplace code-review plugin with an inline reviewer:

  • 4000-line cost guard. Diffs above the threshold are skipped with a comment pointing reviewers at @claude for targeted asks. Keeps spend predictable (~$0.10–0.20 per run on Sonnet 4.6).
  • Lockfile/asset filtering. pnpm-lock, package-lock, yarn.lock, Cargo.lock, images, and source maps are stripped from the diff before review.
  • File context for changed code. Both .ts/.tsx/.js/.jsx and .rs files appear as CHANGED (up to 600 lines each, with truncation markers so the reviewer doesn't false-positive on missing identifiers). For TS/JS files, imports are resolved transitively to add RELATED files (200 lines each). Rust mod resolution is non-trivial, so .rs files appear without related-file walking.
  • CoreMQ-tailored system prompt. Calls out the project-specific bug classes worth flagging: AdminCommand wiring (oneshot reply requirement), packet_id panic class, blocking syscalls in async, channel deadlocks, hardcoded colors, and the en.json / ko.json / uz.json i18n triplet rule. Tells the reviewer not to invent issues from truncated context.
  • Cost reporting. Each review comment includes diff/context line counts, token usage breakdown, and estimated USD cost.
  • Concurrency cancels in-progress reviews when a new push lands on the same PR.

What it looks like in practice

A PR review comment will look like:

## 🤖 Claude Code Review

### Summary
<2-3 sentences>

### Issues
- [major] server/coremq-server/src/engine/engine.rs:142 — Match arm drops oneshot reply on Err path; controller will hang.
  Fix: Send the error response back via `let _ = reply.send(Err(...));` before returning.

---
*Reviewed by claude-sonnet-4-6 · diff: 234 lines · context: 812 lines · tokens: 4,219 in / 287 out · cost: $0.0167 · tag `@claude` to ask follow-up questions*

Test plan

  • Confirm ANTHROPIC_API_KEY is present in repo Actions secrets (already required for the existing workflows — same secret name).
  • Open this PR and verify claude-code-review.yml posts a review comment within ~1 minute of the push.
  • Tag @claude in a comment on this PR and verify claude.yml responds.
  • (optional) Push a 4000+ line diff somewhere and verify the cost guard fires with the skip-notice comment.

Notes

  • This PR replaces, doesn't add. Same two filenames, same secret. No new infrastructure.
  • The workflow scope is required to land changes under .github/workflows/ — already on this token.
  • Companion PR (skills + agent docs) lands separately.

Replace the default-boilerplate @claude handler and the plugin-based
auto-review with a tailored pair:

claude.yml
  - Add concurrency group keyed on issue/PR number so rapid @claude
    mentions don't spawn parallel agent runs.
  - Bump permissions to write on pull-requests/issues so the agent
    can post comments and edit PR descriptions.
  - Set timeout-minutes: 30.
  - Pin model to claude-sonnet-4-6 and pass an allowlist covering the
    cargo / clippy / fmt / test toolchain plus npm / yarn / eslint /
    prettier / gh, alongside Read/Edit/Write/Glob/Grep.

claude-code-review.yml
  - Replace the marketplace plugin invocation with an inline diff +
    file-context build + curl-to-Anthropic flow that returns a single
    review comment with token usage and cost line.
  - 4000-line cost guard skips oversized diffs with an explanatory
    comment, keeping spend predictable.
  - Filter out lockfiles (pnpm/npm/yarn/Cargo.lock) and binary assets
    from the diff.
  - Include both .ts/.tsx/.js/.jsx and .rs in the changed-file context.
    Walk imports for TS/JS only; Rust files appear as CHANGED without
    related-file resolution (Rust mod resolution is non-trivial).
  - Tailored system prompt covers CoreMQ specifics: AdminCommand
    wiring, oneshot reply requirement, packet_id panic class,
    blocking-in-async, channel deadlocks, theme tokens, and the
    en/ko/uz i18n triplet rule.
  - Concurrency cancels in-progress reviews when a new push lands.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

🤖 Claude Code Review

Summary

Two CI workflow files replacing boilerplate with a production-tuned pair. The logic is mostly sound but has a few real issues worth fixing.

Issues

Majorclaude-code-review.yml:68 — Lockfile/asset filtering only skips the single diff --git header line matching those patterns, leaving the rest of the hunk (the actual removed/added lines) in $DIFF. The grep -v approach needs to strip entire file sections, not just the diff --git line.
Fix: use filterdiff --exclude= (from patchutils) or post-process with awk to drop full hunks for matched paths.

Majorclaude-code-review.yml:204max_tokens: 1024 is very tight for an 800-word review budget; a complex diff will hit the limit and return a truncated review with stop_reason: max_tokens, which then gets posted silently as a partial comment with no user-visible warning.
Fix: raise to 4096 and add an explicit check on stop_reason == "max_tokens" to append a truncation notice.

Minorclaude-code-review.yml:79 — Empty-diff check (if [ -z "$RAW_DIFF" ]) comes after the DIFF=$(... head -n 8000) assignment, so $DIFF is already computed; order is fine, but $DIFF_LINES will be 0 on empty input and the cost guard would trigger first (0 < 4000 = false… wait, 0 is not > 4000), so the empty check is actually dead if RAW_DIFF is empty because DIFF_LINES=0 passes the guard. The real problem: an empty diff skips review without error but the Post review comment step still runs because skip is set to true and steps.diff.outputs.skip != 'true' guards it — this is fine. Actually the empty check sets skip=true correctly; the only issue is the check should precede the wc -l on filtered diff to avoid the misleading "0 filtered lines" log noise. Minor ordering nit.

Minorclaude.yml:concurrencygroup uses || in a shell expression sense but this is a GitHub Actions expression: ${{ github.event.issue.number || github.event.pull_request.number }}. If both are absent (shouldn't happen given the if guard, but e.g. on pull_request_review events github.event.issue is null), the group collapses to claude-agent- (empty suffix), serializing unrelated PRs.
Fix: use ${{ github.event.issue.number || github.event.pull_request.number || github.run_id }}.

Nits

  1. ANTHROPIC_API_KEY is exposed as an env var on the review step and interpolated directly into jq/curl — fine since it only goes to the curl header, but storing the key in /tmp/payload.json (which is written to disk) alongside the diff content is worth noting for self-hosted runners.
  2. The /* */ comment convention check in the system prompt references Rust block comments, but claude.yml uses # line comments per YAML convention — no issue, just confirming it's intentional.

Reviewed by claude-sonnet-4-6 · diff: 384 lines · context: 0 lines · tokens: 6819 in / 809 out · cost: $0.032592 · tag to ask follow-up questions

@Sardor-M
Copy link
Copy Markdown
Author

Sardor-M commented May 8, 2026

🤖 Claude Code Review

Summary

Two CI workflow files replacing boilerplate with a production-tuned pair. The logic is mostly sound but has a few real issues worth fixing.

Issues

Majorclaude-code-review.yml:68 — Lockfile/asset filtering only skips the single diff --git header line matching those patterns, leaving the rest of the hunk (the actual removed/added lines) in $DIFF. The grep -v approach needs to strip entire file sections, not just the diff --git line. Fix: use filterdiff --exclude= (from patchutils) or post-process with awk to drop full hunks for matched paths.

Majorclaude-code-review.yml:204max_tokens: 1024 is very tight for an 800-word review budget; a complex diff will hit the limit and return a truncated review with stop_reason: max_tokens, which then gets posted silently as a partial comment with no user-visible warning. Fix: raise to 4096 and add an explicit check on stop_reason == "max_tokens" to append a truncation notice.

Minorclaude-code-review.yml:79 — Empty-diff check (if [ -z "$RAW_DIFF" ]) comes after the DIFF=$(... head -n 8000) assignment, so $DIFF is already computed; order is fine, but $DIFF_LINES will be 0 on empty input and the cost guard would trigger first (0 < 4000 = false… wait, 0 is not > 4000), so the empty check is actually dead if RAW_DIFF is empty because DIFF_LINES=0 passes the guard. The real problem: an empty diff skips review without error but the Post review comment step still runs because skip is set to true and steps.diff.outputs.skip != 'true' guards it — this is fine. Actually the empty check sets skip=true correctly; the only issue is the check should precede the wc -l on filtered diff to avoid the misleading "0 filtered lines" log noise. Minor ordering nit.

Minorclaude.yml:concurrencygroup uses || in a shell expression sense but this is a GitHub Actions expression: ${{ github.event.issue.number || github.event.pull_request.number }}. If both are absent (shouldn't happen given the if guard, but e.g. on pull_request_review events github.event.issue is null), the group collapses to claude-agent- (empty suffix), serializing unrelated PRs. Fix: use ${{ github.event.issue.number || github.event.pull_request.number || github.run_id }}.

Nits

  1. ANTHROPIC_API_KEY is exposed as an env var on the review step and interpolated directly into jq/curl — fine since it only goes to the curl header, but storing the key in /tmp/payload.json (which is written to disk) alongside the diff content is worth noting for self-hosted runners.
  2. The /* */ comment convention check in the system prompt references Rust block comments, but claude.yml uses # line comments per YAML convention — no issue, just confirming it's intentional.

Reviewed by claude-sonnet-4-6 · diff: 384 lines · context: 0 lines · tokens: 6819 in / 809 out · cost: $0.032592 · tag to ask follow-up questions

@claude review this

@claude
Copy link
Copy Markdown

claude Bot commented May 8, 2026

Claude finished @Sardor-M's task in 3m 10s —— View job


Claude Code Review

  • Read both changed workflow files
  • Verify the prior bot review's claims
  • Check for additional issues
  • Post review

🤖 Claude Code Review

Summary

Solid direction overall — the inline reviewer is more cost-controlled and CoreMQ-tailored than the marketplace plugin it replaces. The earlier bot review on this thread caught the two material bugs; I'll confirm them concretely and add a few it missed.

Confirming the prior bot's findings

Major — claude-code-review.yml:43 — Lockfile/asset filter is broken. CONFIRMED.
The regex ^diff --git.*(pnpm-lock\.yaml|...) only matches the single header line. A pnpm-lock.yaml change in a unified diff produces diff --git, index, ---, +++, @@, and the actual +/- lines — grep -v strips only the first. The remaining hunk lines (which dominate, since lockfiles are huge) stay in $DIFF and consume both context and tokens. The user-facing impact isn't just review-quality noise — a 3500-line PR that's mostly pnpm-lock.yaml churn passes the 4000-line cost guard and burns the budget reviewing irrelevant content.

Awk drops the patchutils dependency:

DIFF=$(echo "$RAW_DIFF" | awk '
  /^diff --git/ { skip = ($0 ~ /(pnpm-lock\.yaml|package-lock\.json|yarn\.lock|Cargo\.lock|\.(png|jpg|svg|ico|map))( |$)/) }
  !skip
' | head -n 8000)

Major — claude-code-review.yml:181max_tokens: 1024 is under-provisioned. CONFIRMED.
The system prompt explicitly budgets "under 800 words for large diffs." 800 English words ≈ 1100–1300 output tokens, so a thorough review on a non-trivial PR will hit the cap and stop mid-sentence. There's no stop_reason check at line 197 — it's only logged, not surfaced. Raise to 4096 and gate on stop_reason == "max_tokens":

STOP_REASON=$(echo "$RESPONSE" | jq -r '.stop_reason // ""')
if [ "$STOP_REASON" = "max_tokens" ]; then
  REVIEW="${REVIEW}

> ⚠️ Review truncated at max_tokens — re-tag \`@claude\` for the rest."
fi

Minor — claude.yml:15 — concurrency group can collapse. CONFIRMED.
For a pull_request_review event triggered without an enclosing PR comment, github.event.issue.number is null and github.event.pull_request.number resolves correctly — but if neither is present (defensive case), the group becomes claude-agent- and serializes unrelated events globally. Add a fallback: claude-agent-${{ github.event.issue.number || github.event.pull_request.number || github.run_id }}.

Minor — claude-code-review.yml:47 — empty-diff check ordering.
Agreed it's a cosmetic ordering nit only. Functionally correct.

Additional issues the prior review missed

Minor — claude-code-review.yml:174-189 — no prompt caching, despite cost-counting code that supports it.
Lines 211, 215–216 read and price cache_creation_input_tokens / cache_read_input_tokens, but the request payload never sets cache_control. Those values will always be 0. The system prompt is ~1.5k tokens and identical every run — adding a cache_control: { type: "ephemeral" } breakpoint on the system block cuts the input cost ~90% on the second-and-onward run within a 5-minute window. Worth doing since the cost-reporting infrastructure is already there:

system: [
  { type: "text", text: "Concise code reviewer for the CoreMQ project — ...", cache_control: { type: "ephemeral" } }
]

Minor — claude-code-review.yml:136-138 — TS import regex matches string literals.
s/.*from [\"']([^\"']+)[\"'].*/\1/p matches any line containing from "..." — including console.log("imported from './foo'") or comments. False positives are mostly harmless (resolution fails silently) but inflate context size on files with chatty logging. Anchoring to a leading-keyword pattern helps: s/^[[:space:]]*(import|export)[[:space:]].*from[[:space:]]+["']([^"']+)["'].*/\2/p.

Minor — claude-code-review.yml:108src/* alias hardcoded to client/.
Fine for this repo's layout, but if the CoreMQ tree ever grows a second TS package (e.g. a shared lib), src/Foo will silently resolve to client/src/Foo and miss real targets. Worth a comment at minimum so the assumption is discoverable.

Nit — claude-code-review.yml:43 — regex matches .png mid-path.
\.png|\.jpg|... will match diff --git a/docs/screenshots.png.txt b/... (pathological but possible). Anchor with a word boundary or end-of-token: \.(png|jpg|svg|ico|map)( |$). The awk replacement above already does this.

Nit — claude.yml:48 — allowlist patterns lack a separator before *.
Bash(cargo check*) matches cargo check, cargo checkout, cargo checksum-anything. The intent is clearly cargo check followed by args, so Bash(cargo check *) is more precise. Same for cargo build*, cargo test*, etc. Likely benign in practice — there's no cargo checkout — but worth tightening.

On the prior bot's nits

  1. The /tmp/payload.json concern is real but lower-stakes than they implied — ANTHROPIC_API_KEY itself is never written to the file (it's only in the curl header). What's on disk is the diff + system prompt + key-less payload. On GitHub-hosted runners the runner is ephemeral; on self-hosted it's worth knowing the diff hits /tmp before the API call.
  2. The Rust /* */ vs YAML # comment thing is not an issue — the system prompt's convention applies to files under review, not to the workflow YAML itself.

Fix the lockfile filter → · Raise max_tokens and handle truncation → · Add prompt caching →


· branch chore/refine-claude-workflows

@Sardor-M Sardor-M requested a review from otabek05 May 8, 2026 07:23
@Sardor-M Sardor-M self-assigned this May 8, 2026
@Sardor-M Sardor-M added documentation Improvements or additions to documentation enhancement New feature or request labels May 8, 2026
@Sardor-M
Copy link
Copy Markdown
Author

Sardor-M commented May 8, 2026

Summary

Replace the default-boilerplate @claude handler and the plugin-based auto-review with a tailored pair adapted from a production-tested setup. Same two files, materially different behavior.

claude.yml (the @claude mention handler)

  • Concurrency group keyed on issue/PR number. Rapid @claude mentions on the same PR no longer spawn parallel agent runs.
  • Write permissions on pull-requests and issues so the agent can post comments and edit PR descriptions (was: read-only).
  • timeout-minutes: 30 to bound runaway jobs.
  • Model pinned to claude-sonnet-4-6.
  • Tool allowlist covers the toolchain agents will actually need: cargo check/build/test/clippy/fmt, npm run *, npx prettier/eslint, yarn *, plus gh pr/api, git diff/log/show, and Read/Edit/Write/Glob/Grep.

claude-code-review.yml (auto PR review)

Replace the marketplace code-review plugin with an inline reviewer:

  • 4000-line cost guard. Diffs above the threshold are skipped with a comment pointing reviewers at @claude for targeted asks. Keeps spend predictable (~$0.10–0.20 per run on Sonnet 4.6).
  • Lockfile/asset filtering. pnpm-lock, package-lock, yarn.lock, Cargo.lock, images, and source maps are stripped from the diff before review.
  • File context for changed code. Both .ts/.tsx/.js/.jsx and .rs files appear as CHANGED (up to 600 lines each, with truncation markers so the reviewer doesn't false-positive on missing identifiers). For TS/JS files, imports are resolved transitively to add RELATED files (200 lines each). Rust mod resolution is non-trivial, so .rs files appear without related-file walking.
  • CoreMQ-tailored system prompt. Calls out the project-specific bug classes worth flagging: AdminCommand wiring (oneshot reply requirement), packet_id panic class, blocking syscalls in async, channel deadlocks, hardcoded colors, and the en.json / ko.json / uz.json i18n triplet rule. Tells the reviewer not to invent issues from truncated context.
  • Cost reporting. Each review comment includes diff/context line counts, token usage breakdown, and estimated USD cost.
  • Concurrency cancels in-progress reviews when a new push lands on the same PR.

What it looks like in practice

A PR review comment will look like:

## 🤖 Claude Code Review

### Summary
<2-3 sentences>

### Issues
- [major] server/coremq-server/src/engine/engine.rs:142 — Match arm drops oneshot reply on Err path; controller will hang.
  Fix: Send the error response back via `let _ = reply.send(Err(...));` before returning.

---
*Reviewed by claude-sonnet-4-6 · diff: 234 lines · context: 812 lines · tokens: 4,219 in / 287 out · cost: $0.0167 · tag `@claude` to ask follow-up questions*

Test plan

  • Confirm ANTHROPIC_API_KEY is present in repo Actions secrets (already required for the existing workflows — same secret name).
  • Open this PR and verify claude-code-review.yml posts a review comment within ~1 minute of the push.
  • Tag @claude in a comment on this PR and verify claude.yml responds.
  • (optional) Push a 4000+ line diff somewhere and verify the cost guard fires with the skip-notice comment.

Notes

  • This PR replaces, doesn't add. Same two filenames, same secret. No new infrastructure.
  • The workflow scope is required to land changes under .github/workflows/ — already on this token.
  • Companion PR (skills + agent docs) lands separately.

@otabek05 Can you confirm these changes ?

@Sardor-M Sardor-M merged commit 7328d45 into main May 11, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants